I. PURPOSE OF THE PROJECT


Research toward a syllable communication system, voice to voice or voice to
print, utilizing the phonetic typewriter developed by the RCA Laboratories and
a speech synthesizer which proceeds from pre-recorded spoken syllables. Such a
system, when ultimately developed, should allow transmission of the spoken word
at an extremely low rate, such as 23 bits per second, calculated for a language
of 2000 syllables and a normal speaking rate. A rudimentary and limited vocabulary
model of such a system has been made available together with other speech processing
equipment also developed at the RCA Laboratories. The purpose of this study is
to make improvements in speech processing and to demonstrate the system with a
limited vocabulary using an assembly of laboratory apparatus.
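
The 23 bits per second figure can be checked with a short calculation. The sketch below assumes that all 2000 syllables are equally likely, so that each one carries log2(2000) bits. The speaking rate it prints is not stated in this report; it is only the rate implied by the two quoted figures.

```python
import math

VOCABULARY_SIZE = 2000    # syllables in the language (from the text)
QUOTED_BIT_RATE = 23.0    # bits per second (from the text)

# Assumption: every syllable is equally likely, so each one carries
# log2(2000) bits of information.
bits_per_syllable = math.log2(VOCABULARY_SIZE)

# Speaking rate implied by the two figures above (not stated in the report).
implied_syllables_per_second = QUOTED_BIT_RATE / bits_per_syllable

print(f"{bits_per_syllable:.2f} bits per syllable")
print(f"{implied_syllables_per_second:.2f} syllables per second")
```

Under these assumptions a syllable carries about 11 bits, so the quoted rate corresponds to roughly two syllables per second.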


II.  ABSTRACT 


A limited and rudimentary system developed by the RCA Laboratories is the
starting point of this research. The apparatus assembled can analyze and synthesize
50 syllables or words and, when set up for a given speaker, has the resolving
power and memory capacity to print out an even larger number of words selected
for this purpose. For 50 words the information for printout or speech synthesis
is transmitted at a channel capacity of 6 bits per word.

Investigations were made using the assembled apparatus to determine performance
criteria by which further progress could be measured. Tests were made
using lists of words other than those originally found suitable for processing
by machine. One series of tests was made with a phonetically balanced list of
words as used for standardized articulation testing, such as PB-50 List 1. Other
tests were made with the words representing the current phonetic spelling
alphabet (alfa, bravo...) and also with other words chosen for their military
significance.

Apparatus was assembled to investigate the usefulness of processing envelope
information. Studies were made of the rate of growth, the rate of decay and the
duration of intra-syllable pauses. Tests were made with several thousand
voicings by two speakers, using indications of two quantized levels each for the
growth and decay, to determine a useful operating point. It was found practical
to definitively classify words by these characteristics and thus add to the
resolving power. Work will continue in this area.


III. PUBLICATIONS, LECTURES, REPORTS and CONFERENCES


The basic work done at the RCA Laboratories is partly covered by publications
made prior to work on this contract. A lecture and demonstration
of such prior work was given, as previously scheduled, and at no cost to the
Government, to the Acoustical Society of America Meeting in New York on 24 May 1962.

Two  conferences  were  held  with  representatives  of  the  USASRDL  at  the  RCA 
Laboratories.  Subjects  discussed  were  the  objectives  of  the  contract  and  the 
means  of  conducting  the  research.  These  conferences  are  fully  covered  by 
Conference  Reports  dated  15  May  and  13  July  1962. 


1. Harry F. Olson, Acoustical Engineering, D. Van Nostrand Company (1957).

2. Harry F. Olson and H. Belar, "Phonetic Typewriter," J. Acoust. Soc. Amer.,
28, 6, 1072-1081 (November 1956).

3. Harry F. Olson and H. Belar, "Phonetic Typewriter," IRE Trans. on Audio,
AU-5, 4 (July-August 1957).

4. Harry F. Olson and H. Belar, "Time Compensation for Speed of Talking in
Speech Recognition Machines," IRE Trans. on Audio, AU-8, 3 (May-June 1960).

5. Harry F. Olson and H. Belar, "Syllable Analyzer, Coder and Synthesizer for
the Transmission of Speech," 1961 Proceedings National Aerospace Electronics
Conference, NAECON, 1414 E. Third St., Dayton 3, Ohio.

6. Harry F. Olson and H. Belar, "Syllabic Analyzer, Coder and Synthesizer for
the Transmission of Speech," IRE Trans. on Audio, AU-10, 1 (January-February
1962).

7. Harry F. Olson and H. Belar, "A Print-out System for the Automatic Recording
of the Spectral Analysis of Spoken Syllables," J. Acoust. Soc. Amer., 34, 2
(February 1962).

8. Harry F. Olson and H. Belar, "Phonetic Typewriter III," J. Acoust. Soc. Amer.,
33, 11 (November 1961).

9. Harry F. Olson, H. Belar and R. deSobrino, "Speech Processing Systems,"
J. Acoust. Soc. Amer., 34, 5 (May 1962).


IV.  FACTUAL  DATA 


A.  Introduction 

This contract, very briefly stated, requires research in speech processing,
the construction of equipment to complete the chain of a syllable communications
system, and certain performance tests, etc. During the interval between
submitting the proposal and the awarding of the contract, RCA had completed
construction of the syllable synthesizer and other units, thus substantially
completing this portion of the work. This work was done entirely with RCA
funds. Moreover, during the first month of the period covered by this
report RCA personnel were engaged, also entirely at RCA expense, in making
the equipment ready for a demonstration to the Acoustical Society Meeting
in New York, as reported in the First Monthly Letter Type Report. Thus, no
charges were made against this contract during the first month.

A conference was held on May 15 including the Chief of the Voice
Security Branch, Mr. Martin Weinstock, and the Project Engineer, Mr. A. H. Ross,
as reported in the Conference Report of that date. The state of the development
was reviewed. The equipment as then set up was wired to recognize 52
words by one speaker (HB), of which about 24 words could be worked by a
second speaker (RdS) and 16 by the third speaker tested (GMS). The words
had been chosen for purposes of demonstration and were selected from those
that were found to be recognizable by the machine.

Mr. Weinstock stressed the importance of doing further work in order
to increase the accuracy, the tolerance for different speakers and the
vocabulary to an even greater extent than originally called for. The
feeling of RCA personnel was that, since the construction of the syllable
synthesizer and other work was done at the RCA Laboratories with RCA funds
in advance of the contract award, the suggested research will be possible.
Specific items selected for further research were characteristics of
envelopes, nonlinear amplifiers, etc., and the contractor was encouraged to
explore other possibilities and submit work programs that may result in a
feasibility model with greater capacity, as for example, 100 or even 200
words. The contractor was requested to add words of military significance
and also to make tests with phonetically balanced word lists, etc. During
the balance of the period those requests and suggestions were carried out
to the extent described more fully under the following respective headings.

B.  Characteristics  of  Envelope 

Characteristics of envelope that have been found useful in speech
processing are as follows:

Growth and Decay Characteristics
Duration of Intrasyllabic Pauses
Unbalance of Bilateral Peak Envelope

Tests were made using equipment on hand which was assembled and suitably
modified for this work. The processing of envelope information in various
frequency bands is performed with the apparatus housed in an 82 inch relay
rack. This equipment incorporates two alternative ways of quantizing
growth and decay information. Provisions are made to add the result of
this envelope processing to the display obtained from the phonetic
typewriter, and the information is automatically read out with equipment
developed specially for this purpose. The first method of determining the
rate of growth and decay uses equipment developed earlier in this laboratory
(not published). It determines the rate of growth or decay by the rate of
change of the rectified envelope. This method is described more fully in
the next section. The other method employs amplifiers and relays and
measures the time between reaching different levels.

C. Growth and Decay Detection by Differentiation of the Rectified Envelope

A functional schematic diagram of the Growth and Decay Detector is
shown by Fig. 1, from which it can be seen that the detection of the envelope
characteristics proceeds in parallel with the analysis and processing of the
spoken syllable or word by elements of the phonetic typewriter. The
information on growth and decay is displayed by fields in the 7th row from
the bottom of the spectral memory display, which is otherwise a time compensated,
amplitude normalized display of quantized information of the second
derivative of the spectral response versus time. The field in column 1
was wired to indicate that the rate of growth exceeded a set threshold.
The field in column 5 indicates the same for decay. The other fields in
row 7 are reserved for other properties of envelope such as duration of
intrasyllabic pauses. The channel corresponding to row 7, being one of the
less useful channels, was omitted from the spectral analysis. Referring
again to Fig. 1, the signal from the microphone is amplified and passed
through a 100 cycle high-pass filter to prevent very low frequency components
of sound (or noise) from operating the detectors. After additional amplification
the signal is rectified by two rectifiers of opposite polarity to obtain a
positive and a negative voltage proportional to the average amplitude of the
envelope. The differentiators are connected to the outputs of these rectifiers.
The differentiator associated with the growth detector produces a negative
signal when the negative rectifier output becomes more negative because the
input signal is increasing. The dc amplifier amplifies negative signals
and completes a circuit when the signal exceeds a certain set threshold.
A decrease in input produces a less negative voltage in the rectifier and
produces a positive indication in the output of the differentiator during
the interval in which the signal is decreasing. A positive signal has
no effect on the dc amplifier; thus the amplifier produces an indication only
when the rate of growth has exceeded a set amount. A similar determination is
made during the period of decay except that the polarity of the input is
reversed so that a decrease in input produces a negative indication in the
output of the differentiator. A more complete circuit diagram of the
growth and decay detector is given by Fig. 2.
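
The detection principle described above (rectify, differentiate, compare with a set threshold) can be sketched digitally as follows. This is only an illustration and not the circuit of Fig. 1 or Fig. 2: the one-pole smoothing filter, its time constant, the function names and the threshold units are all assumptions, and the 100 cycle high-pass filter and the minimum-level condition are omitted.

```python
def rectified_envelope(samples, sample_rate, time_constant=0.01):
    """Full-wave rectify the signal and smooth it with a one-pole filter.

    time_constant (seconds) is a hypothetical value, not taken from the report.
    """
    alpha = 1.0 / (time_constant * sample_rate)
    level = 0.0
    envelope = []
    for sample in samples:
        level += alpha * (abs(sample) - level)
        envelope.append(level)
    return envelope


def detect_growth_and_decay(samples, sample_rate, growth_threshold, decay_threshold):
    """Return (fast_growth, fast_decay) for one utterance.

    Both thresholds are rates of change of the envelope, in signal units
    per second. Each flag latches once its threshold has been exceeded.
    """
    envelope = rectified_envelope(samples, sample_rate)
    fast_growth = False
    fast_decay = False
    for previous, current in zip(envelope, envelope[1:]):
        rate = (current - previous) * sample_rate  # discrete derivative
        if rate > growth_threshold:
            fast_growth = True
        if -rate > decay_threshold:
            fast_decay = True
    return fast_growth, fast_decay


# Demonstration signal: abrupt onset followed by a slow linear decay.
# Alternating-sign samples keep the rectified signal free of ripple.
RATE = 8000
abrupt_onset = [(1 - k / RATE) * (-1) ** k for k in range(RATE)]
print(detect_growth_and_decay(abrupt_onset, RATE, 10.0, 10.0))
```

With real speech the rectified signal carries ripple at the pitch and formant frequencies, so the smoothing time constant and the thresholds would have to be chosen together, much as the controls of the apparatus are adjusted to a schedule.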

The time-frequency analyzers, the growth and decay analysis and the
coded output read-out of information all operate in real time.

The performance of the growth and decay detector was measured. A
signal of known rate of growth and decay was applied in place of the microphone
output. Such a signal is obtained by reproducing RCA Laboratories
Tape Record TR-89, which was made for this purpose. The results are shown
by Figs. 3 and 4, which are the calibrations of growth and decay detection,
respectively. It can be noted from Fig. 3 that "fast growth" is
indicated when the rate of growth in the microphone output line exceeds
.009 volts per second, provided a level of not less than .0007 volts is reached.
Similarly it can be noted that the decay detector will indicate fast decay
when the signal input decays at a rate greater than .0005 volts. As noted,
the performance points shown in Figs. 3 and 4 were obtained with the controls
adjusted according to a schedule designated as setup 101. This adjustment
was used for a number of tests described in the next section.

D. Tests Made with Growth and Decay Detector

The  following  tests  were  made  with  the  apparatus  described  in  the 
preceding  section: 

Setup  101 

32 voicings each of 50 words PB-50 List 1, spoken by GMS.

32 voicings each of 50 words PB-50 List 1, spoken by RdS.

32  voicings  each  of  14  words  of  military  significance,  spoken  by  RdS. 

32  voicings  each  of  26  words  of  the  phonetic  alphabet,  spoken  by  RdS. 

This total of 4480 voicings was processed by machine and the digitized
analysis printed out on standardized data sheet forms for each set of 32
voicings, as shown for example by Fig. 5 for the word "box" spoken by GMS.
The 32 ten-letter codes typed in a column on the left of this figure represent
the displays as read out by the machine. It will be noted that the 5th,
7th and 9th columns have printed the identical letter and so do the 6th, 8th
and 10th columns. This is so because the spectral memory has for this setup
been reduced to 3 time steps, which was accomplished by parallel connections
of the last three steps, which in effect makes the third step an extra
long step. The first two letters denote the pattern in the first time
step, etc. as explained before.
If all the codes that are alike are tabulated and portrayed graphically
as patterns in a prescribed grid of frequency and time it is found that, as
also shown by Fig. 5, there are 5 different presentations for the 32 analyses
of the word "box". The pattern that appears most frequently appears 19
times, the next most frequent appears 7 times, and so forth. The columns of the
pattern represent time steps moving from left to right and the rows represent
frequency channels with the lowest frequency channel (250 to 500 cycles per
second) on the bottom. Row 7 is an exception in that column 1 in that row
indicates "fast growth" and column 5 indicates "fast decay" as described
before. The summary on this sheet shows the total number of times each
field was operated in the 32 voicings. From this and a study of the
individual codes a set of displays can be defined by an alternate higher
order code which includes all or most of the codes of the first kind,
following procedures described elsewhere (references 7 and 8). For instance,
Alt. 1 shown requires 4 fields to always operate and 2 fields to "either"
operate or not, and the balance not operated, as shown by solid circles,
crosses and blank squares, respectively. Compared with the known codes it
can be seen that the "fit" of this code is 31/32, as it will miss one display
which occurs one time when it operates the channel 3 relay in the second time
step. The Alt. 3 code, which was drawn primarily for illustration, requires
that the second channel in the 1st column is also not operated. Were such a
code chosen and the logic corresponding to it actually wired, it would have
missed 5 of the displays actually attained and the fit would be only 26/32.
The graph on the lower right hand side portrays the number of new codes
that were found in successive voicings plotted on a semi-log scale. From
the slope of the line drawn through the points it appears that there is
approximately one new code for every doubling of the number of voicings. By
extrapolation it can then be estimated that the next 32 voicings, if made under
the same conditions, would include one new code. Taking this factor into
account allows a prediction to be made of the accuracy with which such a code
wiring would recognize the word. The shape of the curve also indicates the
number of voicings that may be needed to obtain results of a required
accuracy. Summing up the data presented by Fig. 5 for the word "box"
spoken 32 times: it can be said that with the code Alt. 1 as shown, which
fits 31 of the 32 voicings made, and considering that the data is extrapolated
to be 31/32 complete, the word "box" can be recognized 94% of the time; that
is, provided that the code shown by Alt. 1 is mutually exclusive with respect
to any other that is also to be recognized. About the word "box" it can
also be noted that with the apparatus connected per setup 101 all 32
voicings indicate "fast growth" and "fast decay". When the parameters of
growth alone are considered the results can be plotted for all 50 words
of the PB-50 List 1 spoken by GMS in setup 101, as shown in Fig. 6. In this
tabulation the words are listed in order of the frequency of occurrence
of the "fast growth" indications in the 32 voicings of each word. This
is not the order in which these words were spoken. They were spoken as
shown on PB-50 List 1 (reference 10). From Fig. 6 it can be seen that words
beginning with "n" or "s" never indicated "fast growth" and words beginning
with "r" very seldom (3 times in 160 voicings), whereas words beginning with
"d" nearly always indicated "fast growth" (126 out of 128 times). For other
sounds there is some trend but also variations. From the distribution of the
indications, which appears symmetrical, it can be assumed that the choice of
threshold is in a useful range. A similar graph was made for the decay
indication for the same voicings, as is shown by Fig. 7. There the distribution
does not appear as symmetrical and an increase in sensitivity suggests itself.
The sounds ending the words do not appear consistently indicated when considered
out of context, but when the preceding sound is included the indications
make better sense. For instance a word ending in "t" preceded by a vowel,
like "rat", "not", "wheat", tends to show fast decay, but when a semi-vowel
like "n" precedes the final "t", as in "hunt", there is little tendency
to indicate a "fast decay".

10. American Standard for Measurement of Monosyllabic Word Intelligibility,
S3.2, PB-50 List 1.
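
The "fit" of an alternate code and the 94% estimate for the word "box" can be reproduced with a small sketch. The field positions below are hypothetical and are not the patterns of Fig. 5. The counts 19 and 7 come from the text, while the split of the remaining six voicings among the other three displays is assumed. Names such as `matches` and `fit` are illustrative only.

```python
from fractions import Fraction


def matches(display, required, optional):
    """A display fits a code when every required field is operated and
    no field outside the required and "either" fields is operated."""
    return required <= display <= (required | optional)


def fit(displays_with_counts, required, optional):
    """Fraction of voicings whose display is accepted by the code."""
    total = sum(count for _, count in displays_with_counts)
    hits = sum(count for display, count in displays_with_counts
               if matches(display, required, optional))
    return Fraction(hits, total)


# Hypothetical fields written as (row, column): 4 always operated, 2 "either".
REQUIRED = frozenset({(7, 1), (7, 5), (1, 1), (2, 2)})
OPTIONAL = frozenset({(2, 1), (4, 3)})

# Five distinct displays in 32 voicings (the 3, 2, 1 split is assumed).
DISPLAYS = [
    (REQUIRED, 19),
    (REQUIRED | {(2, 1)}, 7),
    (REQUIRED | {(4, 3)}, 3),
    (REQUIRED | {(2, 1), (4, 3)}, 2),
    (REQUIRED | {(3, 2)}, 1),  # operates an extra field, so the code misses it
]

code_fit = fit(DISPLAYS, REQUIRED, OPTIONAL)
# One new code is expected in the next 32 voicings, so the data on hand
# is taken to be 31/32 complete.
completeness = Fraction(31, 32)
estimated_accuracy = code_fit * completeness

print(code_fit, float(estimated_accuracy))
```

The product of the two 31/32 factors is 961/1024, which rounds to the 94% quoted for the word "box".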

The  results  of  tests  using  other  speakers  were  similarly  recorded  and 
studied.  Many  more  of  those  could  be  reported  and  many  additional  studies 
could  be  made  from  the  data  on  hand,  but  it  is  deemed  more  desirable  to  test 
other  methods  for  which  the  instrumentation  has  already  been  provided. 
However, one of the results is worth mentioning. When the same 50 words
are spoken by a different speaker the order of frequency of occurrence
of indication is not the same. Part of this is obviously due to the
speakers, but some variations are due to the nature of the detection as
performed. It does not distinguish between continuous and interrupted
growth, etc.; it is also amplitude sensitive.

A graphic presentation of the performance of the growth detector for
two different speakers is shown by Fig. 8. The words are listed in the
order of frequency of occurrence for one speaker, which is the same as shown
in Fig. 6. The performance of the second speaker is superimposed. (The
first speaker, GMS, was born in Trenton; the second, RdS, was born in Spain.)

E.  Tests  Using  the  NATO  Phonetic  Alphabet 

A phonetic alphabet is an appropriate set of words for a limited
vocabulary speech recognition machine. The present international phonetic
alphabet consists of 26 words, one assigned to each letter of the common Roman
alphabet. Two words are in a grammatical sense monosyllabic, two are three
syllable words and the remaining 22 are two syllable words. Although the
analyzer used was primarily designed to process one syllable words, previous
work has shown that, in some cases, words of more than one syllable could
be processed and recognized.

With these considerations in mind, it was decided to make the first
series of tests using the phonetic alphabet by speaking the words directly
into the machine without syllabicating, i.e., letting the machine analyze
the whole utterance as one machine syllable. This gave good results,
except in the case of the two tri-syllabic words ("November" and "Uniform").
The remaining 24 words could be resolved using mutually exclusive codes
in the same manner as described in the preceding section.

This  test  was  made  using  the  rate  of  growth  and  decay  indicators. 

F. Intrasyllabic Pauses

About one year ago a study was made of the duration of certain types
of intrasyllabic envelope pauses, which suggested the possibility of including
this information to extend the resolving power of a speech recognition
machine. At that time apparatus was designed which measured accurately
(± 1 ms) the duration of intrasyllabic pauses in the sound combinations "SK",
"ST", and "SP" (as in "school", "stand", "spy"). Tests conducted with four
speakers (one of whom was female) and nearly two thousand voicings showed
that there is a significantly longer duration in the pause preceding the
"P" than in those preceding the "K" and "T". This difference was present
in the case of all four speakers.

Thus, a device which detects an envelope pause and indicates whether
its duration exceeds a predetermined amount can be used to recognize the
presence of the sound "SP", which otherwise with Phonetic Typewriter III
may not be distinguished from "ST" or "SK".
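
A device of this kind can be sketched as a measurement of the longest gap inside a syllable followed by a comparison with a set duration. The report gives no numerical threshold and no silence level, so both are left as parameters here; the function names and the sampled-envelope representation are assumptions made for illustration.

```python
def longest_pause_ms(envelope, sample_rate, silence_level):
    """Duration in milliseconds of the longest run of envelope samples
    below silence_level that lies inside the syllable. Silence before
    the first sound and after the last sound is ignored."""
    active = [value >= silence_level for value in envelope]
    if True not in active:
        return 0.0
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    longest = 0
    run = 0
    for is_active in active[first:last + 1]:
        run = 0 if is_active else run + 1
        longest = max(longest, run)
    return 1000.0 * longest / sample_rate


def long_pause_indicated(envelope, sample_rate, silence_level, threshold_ms):
    """True when the intrasyllabic pause exceeds the predetermined amount."""
    pause = longest_pause_ms(envelope, sample_rate, silence_level)
    return pause > threshold_ms


# Synthetic envelope at 1000 samples per second: sound, a gap, sound again.
demo_envelope = [0.0] * 20 + [1.0] * 100 + [0.0] * 60 + [1.0] * 200 + [0.0] * 30
print(longest_pause_ms(demo_envelope, 1000, 0.5))
```

In the apparatus described, the resulting indication would occupy one of the reserved fields of row 7 of the display. The threshold itself would have to come from the measured pause distributions, which are not reproduced in this report.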

This  device  could  be  introduced  in  various  parts  of  the  system;  it 
is  more  convenient  to  extract  pause  duration  information  after  the  spectrum 
has  been  digitized.  The  duration  measuring  and  logic  circuits  were 
developed  into  a  unit  which  at  the  end  of  the  reporting  period  was  built 
but  not  yet  tested. 

G.  Status  and  Summary 

To  review  the  progress  and  status  of  this  contract  in  accordance  with 
the  Signal  Corps  Technical  Requirements  SCL-4303  and  under  the  same  headings 
the  following  can  be  stated: 

1.  Investigation 

Studies were made of characteristics of the speech envelope. Not all
the tests for which apparatus was set up have yet been completed.
Indications of the usefulness of the approach were obtained. The work
done is described in more detail in the preceding sections.

2.  Feasibility  Model 

The  assembly  of  laboratory  apparatus  required  to  demonstrate  the 
feasibility  of  the  system  is  substantially  complete.  The  syllable 
speech  synthesizer,  originally  scheduled  for  construction  under  this 
contract,  was  completed  by  RCA  with  its  own  funds  in  advance  of  the 
contract  award.  Work  remains  to  be  done  on  minor  items  such  as  code 
converters  and  of  course  whatever  changes  that  may  have  to  be  made  as 
a  result  of  improvements  to  the  speech  processing  apparatus  itself. 

3.  Vocabulary 

The  apparatus  made  available  for  this  work  has  the  capacity  to 
store  96  words  in  its  spelling  memory  with  seven  typing  functions  for 
each  word.  At  present  75  words  are  stored  in  this  memory.  The  speech 
synthesizer  has  the  capacity  to  recall  50  words  or  syllables  from  the 
50  track  magnetic  drum  memory.  It  is  presently  charged  to  capacity. 
The analyzer and syllable memory is now set up to recognize 52 words
in 2 languages, that is, 46 English and 6 French words, if spoken by
one speaker (HB). A second speaker of different national origin (RdS)
can work 24 of these words and a third speaker of still different
origin (GMS, Trenton, N. J.) can work about 16 words. There is
capacity left in the syllable memory for additional words; the exact
number depends upon the complexity of the codes to be stored.

The vocabulary now in the syllable memory was originally selected
from lists of most frequently occurring words, to which others were
added to make demonstration sentences. Some words are left over from
certain series of tests on some specific aspect of recognition. In
general, the words were chosen for recognition by machine, as spoken
chiefly by one speaker (HB) for whom most data was available. To
increase the vocabulary of the machine, mere addition to the memory
capacity is not enough; however, tests can be made without actually
wiring up memory circuits. The results of these tests can be made to
indicate whether or not the displays obtained are mutually exclusive
from those of other words, and therefore capable of recognition if
so wired.
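
Such a test for mutual exclusiveness can be carried out on paper, or in a short program, before any memory circuits are wired. The sketch below assumes each word's code is a pair of field sets (fields that must operate and fields that may "either" operate or not, all others being required off). The three words and their field positions are hypothetical examples, not data from the tests.

```python
from itertools import combinations


def mutually_exclusive(code_a, code_b):
    """Two codes are mutually exclusive when no single display can
    satisfy both of them."""
    required_a, optional_a = code_a
    required_b, optional_b = code_b
    allowed_a = required_a | optional_a
    allowed_b = required_b | optional_b
    # A common display exists exactly when every field required by either
    # code is permitted by both codes.
    return not ((required_a | required_b) <= (allowed_a & allowed_b))


def conflicting_pairs(vocabulary):
    """Return the pairs of words whose codes could both accept one display."""
    return [(a, b) for a, b in combinations(sorted(vocabulary), 2)
            if not mutually_exclusive(vocabulary[a], vocabulary[b])]


# Hypothetical codes: (required fields, "either" fields), fields as (row, column).
VOCABULARY = {
    "box": (frozenset({(1, 1), (2, 2)}), frozenset({(2, 1)})),
    "rat": (frozenset({(1, 1), (3, 1)}), frozenset()),
    "not": (frozenset({(1, 1)}), frozenset({(2, 2), (2, 1)})),
}

print(conflicting_pairs(VOCABULARY))
```

A pair reported as conflicting corresponds to two words that sound alike to the machine under the chosen codes; either the codes must be tightened at some cost in fit, or one of the words must be edited out of the list.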

To test the operation of the apparatus with other than the original
vocabulary, words were selected in conformance with directions received,
namely to use words of military significance, and to use word lists
that are phonetically balanced. Pending an agreement on the vocabulary
to be used for the demonstration of the feasibility model, a preliminary
selection of words was made by the contractor. One list contained
14 words of military significance which, when added to some of the
words now in the vocabulary, would allow demonstration of such messages.
For phonetically balanced lists, the PB-50 List 1 from the American
Standard was chosen. Data was collected for 3 speakers with the object
of establishing a bench mark for the operation of the equipment and a
starting point from which progress could be measured.


The test with the phonetically balanced list of words made by one
speaker, who voiced each of the 50 words 32 times for a total of 1600
phonations, resulted in data which was analyzed. Two successive trials
were made to derive mutually exclusive codes or sets of numbers or
displays which when stored in a syllable memory would recognize these
words. It was found possible to specify codes or sorting procedures
that would recognize with varying accuracy 49 of those 50 words as
spoken by that speaker. Two words sounded alike to the machine and
could not be resolved. A similar test was made with the phonetic
alphabet.

4.  Speakers 

The  first  series  of  tests  were  made  by  the  same  3  speakers  that 
have  been  associated  with  this  development  before  the  start  of  this 
contract.  Tentative  plans  have  been  made  to  include  Signal  Corps 
personnel. 

5.  Transmission  Rate 

The information is transmitted at present from the analyzer to
the typing unit by a single connection to the respective circuit,
each word having its own circuit. The transmission of 50 words to
the speech synthesizer is by a 6 bit code transmitted over 6 wires plus
one return. Equipment for the conversion of this code into a
"serial" bit code that can be transmitted over 2 wires has been
partially designed and constructed.
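
The 6 bit figure follows from the vocabulary size, since 6 is the smallest number of bits that can distinguish 50 words. The sketch below shows one way a parallel 6 bit code could be turned into a serial bit stream and back. The numbering of the words, the bit order and the absence of any framing or synchronizing bits are assumptions; the report does not describe the serial format.

```python
import math

VOCABULARY_SIZE = 50
# Smallest whole number of bits that can label 50 different words.
BITS_PER_WORD = math.ceil(math.log2(VOCABULARY_SIZE))


def word_to_parallel(index):
    """Return the bits for one word, most significant first, as they
    might appear on the parallel wires."""
    if not 0 <= index < VOCABULARY_SIZE:
        raise ValueError("word index outside the vocabulary")
    return [(index >> shift) & 1 for shift in range(BITS_PER_WORD - 1, -1, -1)]


def parallel_to_serial(word_indices):
    """Concatenate the codes of successive words into one bit stream."""
    stream = []
    for index in word_indices:
        stream.extend(word_to_parallel(index))
    return stream


def serial_to_words(stream):
    """Recover the word indices from a serial bit stream."""
    words = []
    for start in range(0, len(stream), BITS_PER_WORD):
        value = 0
        for bit in stream[start:start + BITS_PER_WORD]:
            value = (value << 1) | bit
        words.append(value)
    return words


message = [0, 17, 49]
print(BITS_PER_WORD, serial_to_words(parallel_to_serial(message)))
```

A practical serial link would also need some means of marking word boundaries, which is left out of this sketch.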

6. Intelligibility

The  accuracy  of  recognition  by  the  present  apparatus  is  better 
than  the  goal  of  85%  given  in  the  technical  requirement,  for  the  words 
chosen  to  suit  the  machine  and  with  the  machine  set  up  for  a  particular 
speaker.  It  is  lower  than  desired  for  other  word  lists  such  as  the 
50  words  of  PB-50  tested  by  a  different  speaker.  The  apparatus, 
however,  can  handle  the  majority  of  such  words  and  should  meet  the 
specifications  with  a  relatively  small  amount  of  editing  of  the  word 
list.


V. CONCLUSIONS


From many of the tests performed with growth and decay detectors it can be
concluded that information useful for the recognition of speech by machine can
be extracted by such means. Setups have been provided to try other methods of
detection and it is felt that these should be tested before making final
conclusions.

The  apparatus  as  now  constituted  more  than  meets  the  requirements  for 
accuracy  (85%)  and  vocabulary  (40-50  words)  set  for  the  feasibility  model; 
provided  that  the  vocabulary  is  chosen  to  suit  the  machine  and  the  machine  is 
setup  to  perform  with  one  specified  speaker.  It  was  known  that  mere  addition 
Lo  the  memory  was  not  enough  to  add  to  the  vocabulary  and  that  it  would  also 
require  additional  resolving  power  in  the  speech  processing  analyzer.  A  major 
research  effort  is  still  needed  in  this  direction  toward  a  full  scale  machine 
but,  from  the  tests  made,  it  was  learned  that  the  performance  of  the  present 
apparatus  was  better  than  expected  when  tested  with  a  different  vocabulary  and 
speaker.  Indications  are  that  most  of  the  words  from  a  limited  vocabulary  list, 
like  the  phonetically  balanced  word  lists,  can  be  handled  by  the  machine. 
VI.  PROGRAM  FOR  THE  NEXT  INTERVAL 


It  is  planned  to  continue  tests  with  the  apparatus  for  growth  and  decay 
detection  by  differentiation  of  the  rectified  envelope,  in  order  to  optimize  its  performance. 
Tests  will  also  be  made  with  the  apparatus  set  up  to  detect  the  rate  of  growth  and  decay 
by  measuring  the  time  interval  between  the  envelope  reaching  different  levels.  These  tests 
will  cover  different  frequency  bands  of  the  speech  signal  and  different 
speakers.  To  facilitate  the  optimizing  process,  some  of  the  speakers 
will  be  recorded. 
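
The  two  detection  methods  named  above  can  be  illustrated  with  a  short  numerical  sketch.  It  is  not  a  description  of  the  laboratory  circuitry,  which  is  analog;  it  only  shows  the  two  ideas  on  a  sampled  signal.  The  function  names,  the  moving-average  smoothing,  and  the  parameters  `window`,  `dt`,  `start_level`  and  `end_level`  are  assumptions  made  for  illustration. 

```python
def rectified_envelope(samples, window):
    """Full-wave rectify the signal, then smooth it with a moving average
    over the last `window` samples (an assumed stand-in for an RC filter)."""
    rect = [abs(s) for s in samples]
    env = []
    acc = 0.0
    for i, r in enumerate(rect):
        acc += r
        if i >= window:
            acc -= rect[i - window]  # drop the sample leaving the window
        env.append(acc / min(i + 1, window))
    return env


def envelope_slope(env, dt):
    """Method 1: differentiate the envelope (first difference).
    Positive values indicate growth, negative values indicate decay."""
    return [(env[i + 1] - env[i]) / dt for i in range(len(env) - 1)]


def level_crossing_time(env, start_level, end_level, dt):
    """Method 2: time between the envelope reaching two levels.
    For growth pass start_level < end_level; for decay pass
    start_level > end_level and supply only the portion after the peak.
    Returns None if either level is never reached."""
    rising = end_level > start_level

    def reached(value, level):
        return value >= level if rising else value <= level

    t0 = None
    for i, e in enumerate(env):
        if t0 is None and reached(e, start_level):
            t0 = i
        if t0 is not None and reached(e, end_level):
            return (i - t0) * dt
    return None
```

A  short  crossing  time  (or  a  large  slope  magnitude)  corresponds  to  fast  growth  or  fast  decay;  the  level  and  slope  settings  that  separate  "fast"  from  "slow"  would  have  to  be  found  experimentally  for  each  frequency  band  and  speaker. 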

It  is  also  intended  to  complete  the  setup  of  the  intrasyllabic  pause 
detector  and  to  make  tests  with  it  using  the  same  readout  facilities. 
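
The  idea  of  an  intrasyllabic  pause  can  be  sketched  as  a  run  of  low  envelope  values  bounded  on  both  sides  by  sound.  The  following  is  a  minimal  illustration  under  assumed  conditions,  not  the  detector  under  construction:  the  envelope  is  taken  as  a  list  of  samples,  and  `threshold`  and  `min_len`  are  hypothetical  settings. 

```python
def find_pauses(env, threshold, min_len):
    """Return (start, end) index pairs of runs where the envelope stays
    below `threshold` for at least `min_len` samples, with sound both
    before and after the run. Leading and trailing silence is ignored."""
    pauses = []
    run_start = None     # index where the current low run began
    seen_sound = False   # True once the envelope has exceeded threshold
    for i, e in enumerate(env):
        if e >= threshold:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
            seen_sound = True
        elif seen_sound and run_start is None:
            run_start = i
    return pauses
```

For  example,  `find_pauses([0, 0, 5, 5, 0, 0, 0, 5, 5, 0], 1, 2)`  reports  one  pause  spanning  indices  4  to  7;  the  silence  before  the  first  sound  and  after  the  last  is  not  counted. 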

The  tests  will  include  calibration  of  the  duration  measurements  and 
adjustments  to  compensate  for  the  inevitable  variations  in  the  operating  and 
releasing  times  of  the  spectral  memory  relays. 

Provisions  have  been  made  to  make  use  of  envelope  pauses  in  phonemic 
environments  other  than  those  starting  with  the  fricative  "s". 

A  list  of  words  furnished  by  IEA5RDL,  some  of  which  may  be  problem  words 
because  they  are  very  closely  related  phonetically,  will  be  tested  and  the  results 
analyzed. 

A  study  will  be  undertaken  of  ways  to  recognize  certain  sounds  out  of 
context,  such  as  the  starting  sound  "s"  or  "f"  as  in  "see"  and  "fee", 
and  of  incorporating  this  information  in  the  display  for,  and  recognition  of, 
a  syllable. 

Other  classifications  of  envelope  characteristics  by  machine  will  be  sought 
such  as  envelopes  with  interrupted  growth  and  decay  characteristics  or  envelopes 
with  more  than  one  maximum  amplitude. 
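
One  way  such  a  classification  might  be  made  is  to  count  the  maxima  of  the  envelope  while  ignoring  small  ripples.  The  sketch  below  is  an  assumption  offered  for  illustration  only:  the  hysteresis  rule  and  the  parameter  `min_drop`  are  not  taken  from  the  apparatus. 

```python
def count_envelope_maxima(env, min_drop):
    """Count envelope maxima using hysteresis. A maximum is counted only
    once the envelope has fallen at least `min_drop` below it, and a new
    rise is recognized only after the envelope has climbed at least
    `min_drop` above the following minimum."""
    if not env:
        return 0
    count = 0
    rising = True
    extreme = env[0]  # running maximum while rising, running minimum while falling
    for e in env[1:]:
        if rising:
            if e > extreme:
                extreme = e
            elif extreme - e >= min_drop:
                count += 1          # peak confirmed by a sufficient drop
                rising = False
                extreme = e
        else:
            if e < extreme:
                extreme = e
            elif e - extreme >= min_drop:
                rising = True       # a new growth phase has begun
                extreme = e
    return count
```

With  `min_drop = 2`,  the  envelope  `[0, 5, 1, 6, 0]`  is  counted  as  having  two  maxima,  while  `[0, 5, 4.5, 6, 0]`  is  counted  as  having  one,  since  the  dip  to  4.5  is  treated  as  an  interruption  of  growth  rather  than  a  separate  peak. 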
VII.  IDENTIFICATION  OF  KEY  TECHNICAL  PERSONNEL 


H.  Belar  -  Project  Supervisor  110-1/2  hours 
R.  de  Sobrino  -  Project  Engineer  199-1/2  hours 
E.  S.  Rogers  -  Engineer  0  hours 

Brief  descriptions  of  the  background  of  key  technical  personnel  involved  in 
the  work  follow: 

H.  BELAR 

Mr.  Belar  is  a  graduate  of  the  Naval  Academy  of  Austria-Hungary,  1918.  He 
is  a  Fellow  in  the  Acoustical  and  Electromechanical  Research  Laboratory  at 
the  David  Sarnoff  Research  Center,  Princeton,  N.  J.,  and  has  worked  for  RCA  for  34  years. 
He  has  made  major  contributions  to  the  field  of  speech  and  music  analysis  and 
synthesis  and  holds  27  U.  S.  patents.  He  is  the  co-author  of  many  published 
papers  in  the  field  of  speech  and  music  analysis  and  synthesis  and  acoustics  in 
general. 

R.  de  SOBRINO 

Dr.  de  Sobrino  holds  the  degrees  of  EE,  Spanish  Navy;  MEE,  Brooklyn  Polytech.; 
and  D.  Eng.  Sc.,  Columbia  University.  He  has  worked  on  communication  equipment  for 
the  Spanish  Navy  since  1948,  mainly  in  the  equivalent  of  the  Bureau  of  Naval 
Research  in  Madrid.  He  spent  a  year  at  Marconi  Espanola,  Madrid,  organizing 
a  radar  testing  laboratory  and  another  year  at  the  Instituto  Nacional  de 
Industria  also  in  Madrid,  working  on  the  preliminary  plans  to  install  Nuclear 
Power  Stations  in  Spain.  In  April,  1959,  Dr.  de  Sobrino  joined  RCA  Laboratories, 
Princeton,  N.  J.,  where  he  is  currently  working  in  the  Acoustical  and 
Electromechanical  Research  Laboratory  on  speech  analysis. 
E.  S.  ROGERS 


Mr.  Rogers  received  the  B.A.  degree  in  Mathematics  and  Physics  from 
Susquehanna  University  in  1942  and  the  M.S.  degree  in  physics  from  Case  Institute 
of  Technology  in  1943.  From  1943  to  1945  he  served  with  Columbia  University 
Division  of  War  Research,  National  Defense  Research  Council  and  the  USN 
Underwater  Sound  Reference  Laboratory,  engaged  in  underwater  acoustics.  In  1945 
he  joined  the  staff  of  RCA  Laboratories  in  Princeton,  N.  J.  His  work  has  been 
primarily  in  the  field  of  acoustics,  specializing  in  underwater  sound,  ultrasonics, 
speech  analysis,  noise  reduction,  and  solid  state  electromechanical  transducers. 
Specifically,  he  has  spent  the  last  four  years  on  the  articulation  of  speech 
pickup  in  rooms,  electronic  noise-reducing  systems  for  improving  the 
intelligibility  of  speech  in  the  presence  of  noise,  and  formant  trackers  of  speech. 
Fig. 5.  Codes  of  "Box",  spoken  by  G.M.S. 
In  order  of  frequency  of  occurrence  of  "Fast  Decay"  indication. 